PROBABILITY
Section I

Most people find it difficult to think about chance events. Let's see if we can agree on answers to the following questions:

Question 1: If we drop a nail from a height of about 4 ft., will it land in a "point-up" position, or in a "point-down" position?

Question 2: If we drop a thumbtack from a height of about 4 ft., will it land in a "point-up" position, or in a "point-down" position?

Question 3: Alan says that if we drop a thumbtack 20 times, from a height of about 4 ft., it will land "point-up" 7 times, and "point-down" 13 times. Do you agree?



Performing an Experiment and Recording Data

In dropping a thumbtack 20 times, and recording the outcomes, we are performing an experiment. It is usually desirable to drop the thumbtack in nearly the same way each time. We can do this by setting the tack "point-up" on a desk, and slowly pushing it off the desk by means of the edge of a book. You can think of many other ways to achieve uniformity: for example, you can rest your forearm on a desk and hold the tack in a paper cup, then turn the cup quickly upside down, so that the tack falls out.

The way that you record your data is also important. You want to preserve as much of the data as possible, so that you can use it to answer new questions that may arise in the future. One way to do this, in the thumbtack experiment, is to use the letter "U" to mean "point-up" and to use the letter "D" to mean "point-down". Record each outcome in the order of occurrence, grouping the symbols in groups of 5, so that your record for 20 drops might look like this:

U U U D U
D U U D D
D U U D D
D U U D U
Question 4: Jerry says that you can't be sure of the outcome when you drop a tack 20 times, because 20 is too small a number. He says that Alan could guess the outcome more closely if we used 40 drops instead of 20. What do you think?

A Big Experiment

In order to answer Jerry's question, Alex suggested a co-operative experiment by the entire class.

Each person dropped a tack 20 times, and recorded the "U's" and "D's" in the order in which they occurred, grouping them into groups of 5.

After this data had been recorded, the class tried to decide whether it was easier to guess the outcome for 20 drops or for 40.

They decided that part of the problem was the question of "consistency" or "stability". Here is what Alex did with the data recorded by Marilyn, Jerry, Harold and Ellen. Their original data looked like this:



Marilyn:   D D D U U        Jerry:   D U D U U
           D D D U D                 D U U D U
           U D D U D                 U U U U D
           U D U U D                 U U U U U

Harold:    U U D U D        Ellen:   U D U U D
           U U D U D                 U U U U U
           D D D D U                 D D U U U
           D U D D D                 D D D U D

Alex made 4 groups of 20 drops, as follows:

Marilyn   12 "downs" and 8 "ups"
Jerry     5 "downs" and 15 "ups"
Harold    11 "downs" and 9 "ups"
Ellen     8 "downs" and 12 "ups"

Alex was trying to see how much variation there was in the number of times the tack landed "point-up" in 20 drops:

Marilyn   8 "ups"
Jerry     15 "ups"
Harold    9 "ups"
Ellen     12 "ups"

Question 5: Do you think that these numbers vary so much that it is hopeless to try to guess the number of U's that will appear in 20 drops?

Alex combined Marilyn's and Jerry's data, to get a group of 40 drops:

Marilyn and Jerry: 17 "downs" and 23 "ups"

Combining Harold's and Ellen's data, he made another group of 40:

Harold and Ellen: 19 "downs" and 21 "ups"

In order to get 2 more groups of 40 drops each, Alex used the data recorded by 4 other members of the class:



Tony:      U U U U D        Richard:   D U D D D
           U D D U U                   D U D D D
           U U U U D                   D U U D U
           U D D U U                   D U D U D

Nancy:     D D U D U        Susan:     U U U D D
           D U U U U                   D U D U U
           U U U D D                   U U D D U
           U D U U U                   U U D D D

Tony and Richard:   6 + 13 = 19 "downs"
                    14 + 7 = 21 "ups"

Nancy and Susan:    7 + 9 = 16 "downs"
                    13 + 11 = 24 "ups"



Question 6: For 20-drop groups, the numbers of "ups" in each of 4 groups were: 8, 15, 9, 12.

For 40-drop groups, the numbers of "ups" in each of 4 groups were: 23, 21, 21, 24.

Which would you rather try to predict, the outcome for a group of 20 drops, or the outcome for a group of 40 drops?

Question 7: Jerry says there is not enough data here to be convincing. Can you suggest a way to get more data?

Here is the data taken by other members of the class:



Joan:      U U D D U        Jim:     D U U D D
           D U U D U                 D D U U D
           U D U D U                 U D D U U
           U U D D U                 U D D U U

Francis:   D D U D U        Tony:    D D D U D
           D U U D D                 U D U D D
           D U D D U                 U D U D D
           D U D D U                 U D D D U

Marge:     D U D U D        Rene:    U U U U U
           U D D D U                 D D D D D
           D D U U U                 D D U U U
           D U D D U                 U U D U D

George:    D D D U D        Steve:   U U U D U
           U D U U U                 U U D U D
           D D D U D                 D U U D U
           U U U D U                 U D D D U

Jeff:      U D U D U        Mary:    U D D D D
           D U D D D                 D U U U D
           U D D U D                 D U D U D
           D D U D D                 D U U U D

Ann:       U D U U D        Jake:    D D D D U
           U U D U D                 U U U U U
           D D D U U                 D D D U U
           D U U U U                 D U U U U


Jerry now made up 10 groups of 5 drops each, using the first five drops from the first 10 students:

Number of "ups" in each group of 5 drops:
2, 3, 3, 3, 4, 1, 2, 3, 3, 2
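The tally that Jerry made can be sketched in a few lines of Python. This is only an illustration: the function name `count_ups` is made up here, and the ten records assume that the "first 10 students" are read in the order Marilyn, Jerry, Harold, Ellen, Tony, Richard, Nancy, Susan, Joan, Jim, with each student's first group of five taken from the data above.

```python
def count_ups(groups):
    """Return the number of 'U' outcomes in each group of drops."""
    return [group.count("U") for group in groups]


# First five drops of ten students (assumed reading order, see above).
first_five = [
    "DDDUU",  # Marilyn
    "DUDUU",  # Jerry
    "UUDUD",  # Harold
    "UDUUD",  # Ellen
    "UUUUD",  # Tony
    "DUDDD",  # Richard
    "DDUDU",  # Nancy
    "UUUDD",  # Susan
    "UUDDU",  # Joan
    "DUUDD",  # Jim
]

print(count_ups(first_five))  # [2, 3, 3, 3, 4, 1, 2, 3, 3, 2]
```

Keeping the full record of U's and D's, rather than only the totals, is what makes this kind of regrouping possible later.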



Section II

Permanent Experiment #1

Why don't you perform a big thumbtack experiment with your class? If you have 20 or more people in class, have each person drop a tack 40 times. Have him record each "Up" or "Down" as it occurs, and separate his answers into groups of 5 each. Then, by combining groups, you will be able to get 10 groups of 5, or 10 groups of 10, or 10 groups of 20, or 10 groups of 40, or 10 groups of 80.

Keep all your data. We will be able to make use of it again in the future.

Is the number of U's more predictable in a large sample, or in a small sample?

A. Record the number of U's in each of 10 groups of 5:

B. Record the number of U's in each of 10 groups of 10:

C. Record the number of U's in each of 10 groups of 20:

D. Record the number of U's in each of 10 groups of 40:

E. Record the number of U's in each of 10 groups of 80:

Question 1: Looking at your data above, where is it easier to predict the number of U's: in groups of 5, or in groups of 20, or in groups of 80?

Question 2: Can you describe what we mean by the "variability" in a set of numbers? Which set of numbers shows the greatest variability, the set recorded under A, or the set recorded under C, or the set recorded under E?

We need some good methods for studying how much "variation" there is in a set of numbers. Here are 5 methods:*
* We shall take our data from Section I. Why don't you use data from the experiment that your class did?

I. The Method of "Just Looking".

For groups of 5, we got these numbers (counting "Ups"):

2, 3, 3, 3, 4, 1, 2, 3, 3, 2

For groups of 20, we got these numbers:

14, 7, 12, 12, 10, 10, 9, 10, 11, 11

By just looking at these numbers, which set of numbers seems to show greater "variation"?

II. The Method of Graphs.

We can show the first set of numbers on a graph like this:

[Two graphs: number of occurrences plotted against number of "Ups", one for the groups of 5 and one for the groups of 20]

From looking at these two graphs, which set of numbers seems to show greater variability?

III. The Method of Mean Absolute Deviation from the Mean.

One good method requires that we compute the "average" or "mean" for each set of numbers:

2 + 3 + 3 + 3 + 4 + 1 + 2 + 3 + 3 + 2 = 26

26 / 10 = 2.6

14 + 7 + 12 + 12 + 10 + 10 + 9 + 10 + 11 + 11 = 106

106 / 10 = 10.6

We then compute the distance (on the number line) between each number and the mean:

|2 - 2.6| = 0.6
|3 - 2.6| = 0.4
|3 - 2.6| = 0.4
|3 - 2.6| = 0.4
|4 - 2.6| = 1.4
|1 - 2.6| = 1.6
|2 - 2.6| = 0.6
|3 - 2.6| = 0.4
|3 - 2.6| = 0.4
|2 - 2.6| = 0.6

We have now computed the "deviations from the mean" for our first set of numbers. We now proceed to compute the average deviation by averaging these new numbers:

0.6 + 0.4 + 0.4 + 0.4 + 1.4 + 1.6 + 0.6 + 0.4 + 0.4 + 0.6 = 6.8

6.8 / 10 = 0.68

This is the mean absolute deviation for the first set of numbers (groups of 5 drops).

Now, let's do the same thing with our second set of numbers (groups of 20 drops):

|14 - 10.6| = 3.4
|7 - 10.6| = 3.6
|12 - 10.6| = 1.4
|12 - 10.6| = 1.4
|10 - 10.6| = 0.6
|10 - 10.6| = 0.6
|9 - 10.6| = 1.6
|10 - 10.6| = 0.6
|11 - 10.6| = 0.4
|11 - 10.6| = 0.4

3.4 + 3.6 + 1.4 + 1.4 + 0.6 + 0.6 + 1.6 + 0.6 + 0.4 + 0.4 = 14

14 / 10 = 1.4

This is the mean absolute deviation for the second set of numbers (groups of 20 drops).
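The arithmetic of Method III can be checked with a short Python sketch. The function name `mean_absolute_deviation` is chosen here for illustration; the two lists are the sets of numbers used in the text.

```python
def mean_absolute_deviation(values):
    """Average distance between each value and the mean of the values."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


groups_of_5 = [2, 3, 3, 3, 4, 1, 2, 3, 3, 2]
groups_of_20 = [14, 7, 12, 12, 10, 10, 9, 10, 11, 11]

# Rounded to two places to hide floating-point noise.
print(round(mean_absolute_deviation(groups_of_5), 2))   # 0.68
print(round(mean_absolute_deviation(groups_of_20), 2))  # 1.4
```

The same function can be applied directly to the numbers recorded under A through E in your own class experiment.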

From this method of comparison, which set of numbers seems to vary more? 



IV. The Method of Comparing Ranges.

For the first set of numbers

2, 3, 3, 3, 4, 1, 2, 3, 3, 2

the smallest is 1 and the largest is 4. The range, therefore, is 4 - 1 = 3.

For the second set of numbers

14, 7, 12, 12, 10, 10, 9, 10, 11, 11

the smallest is 7 and the largest is 14. The range, therefore, is 14 - 7 = 7.

From comparing the ranges, which set of numbers seems to show the greater variability?

V. The Method of Comparing Trimmed Ranges.

To use this method, we arrange the numbers in order of size:

1, 2, 2, 2, 3, 3, 3, 3, 3, 4

7, 9, 10, 10, 10, 11, 11, 12, 12, 14

We then "trim" each set by discarding (say) the two largest and the two smallest numbers in each:

2, 2, 3, 3, 3, 3

10, 10, 10, 11, 11, 12

For these "trimmed" sets of numbers, we compute the ranges:

3 - 2 = 1     trimmed range for first set of numbers (groups of 5)

12 - 10 = 2   trimmed range for second set of numbers (groups of 20)

By using this method of comparison, which set of numbers seems to show greater variability?
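Methods IV and V can be sketched in Python as follows. The names `value_range` and `trimmed_range`, and the `trim` parameter, are illustrative choices; the default of discarding two numbers from each end follows the "(say) two" used in the text.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def trimmed_range(values, trim=2):
    """Range after discarding the `trim` smallest and `trim` largest values."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return kept[-1] - kept[0]


groups_of_5 = [2, 3, 3, 3, 4, 1, 2, 3, 3, 2]
groups_of_20 = [14, 7, 12, 12, 10, 10, 9, 10, 11, 11]

print(value_range(groups_of_5), value_range(groups_of_20))      # 3 7
print(trimmed_range(groups_of_5), trimmed_range(groups_of_20))  # 1 2
```

Trimming makes the result less sensitive to a single unusual group, at the cost of ignoring part of the data; this is one point to weigh when comparing the five methods.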

Question 3: Which set of numbers, in your data, shows greater variability, the set recorded under C, or the set recorded under E?

Question 4: Can you predict the total number of "Ups" more accurately in small numbers of tosses, or in large numbers of tosses?

Question 5: If we want to get a set of numbers showing twice as much variability, should we use sample sizes twice as large? One-half as large? Four times as large? One-fourth as large? Or what?

Question 6: Do you know how mathematicians express the answer to Question 5?

Question 7: What advantages and disadvantages can you find to help choose between the 5 different methods for comparing variability?

Question 8: Jerry says that, even though the second set of numbers seemed to show more variability, there is some sense in which it really shows less variability. What do you think? How would you suggest we deal with these two sets of numbers?

Section III

Proportional Occurrence of U's

Question 1: Ellen says that even though the total number of U's is harder to predict for larger samples, the proportional occurrence of U's is easier to predict for larger samples. What do you think? What does your data suggest?

Let's test Ellen's suggestion by each of our 5 methods for comparing variability. In Section II we compared the variability of the total number of U's. We now compare the variability of the proportional or fractional number of U's.

Question 2: How do you expect the variability of the fractional number of U's in the 5-drop case will compare with the variability of the fractional number of U's in the 20-drop case?

Method I:

The fractional number of U's in the 5-drop case can be found by taking the total number of U's:

2, 3, 3, 3, 4, 1, 2, 3, 3, 2

and dividing by the total number of drops (in this case, 5):

2/5, 3/5, 3/5, 3/5, 4/5, 1/5, 2/5, 3/5, 3/5, 2/5

For the groups of 20 drops we get

14/20, 7/20, 12/20, 12/20, 10/20, 10/20, 9/20, 10/20, 11/20, 11/20

Can you tell, by just looking, which set of numbers varies more?

Method II: Comparison by Graphs.

We shall mark both sets of numbers on the same graph, using x's to indicate the 1st set, and o's to indicate the 2nd set:



|0.8 - 0.52| = 0.28
|0.2 - 0.52| = 0.32
|0.4 - 0.52| = 0.12
|0.6 - 0.52| = 0.08
|0.6 - 0.52| = 0.08
|0.4 - 0.52| = 0.12

Averaging these deviations, we get

0.12 + 0.08 + 0.08 + 0.08 + 0.28 + 0.32 + 0.12 + 0.08 + 0.08 + 0.12 = 1.36

1.36 / 10 = 0.136

This is the mean absolute deviation for the 1st set of numbers (groups of 5 drops).



We can now do the same thing for the 2nd set of numbers (groups of 20 drops):

The mean is

10.6 / 20 = 0.53

The absolute deviations from this mean are:

|0.70 - 0.53| = 0.17
|0.35 - 0.53| = 0.18
|0.60 - 0.53| = 0.07
|0.60 - 0.53| = 0.07
|0.50 - 0.53| = 0.03
|0.50 - 0.53| = 0.03
|0.45 - 0.53| = 0.08
|0.50 - 0.53| = 0.03
|0.55 - 0.53| = 0.02
|0.55 - 0.53| = 0.02

0.17 + 0.18 + 0.07 + 0.07 + 0.03 + 0.03 + 0.08 + 0.03 + 0.02 + 0.02 = 0.70

0.70 / 10 = 0.07

This is the mean absolute deviation for the 2nd set of numbers (groups of 20 drops, using proportion of U's rather than total number of U's).

Which set of numbers seems to vary more? How much more?
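As a check on the two results for proportions, here is a small Python sketch. The helper names `proportions` and `mean_absolute_deviation` are illustrative; the counts are the two sets of numbers from the text.

```python
def proportions(counts, n):
    """Convert counts of U's into fractions of a sample of size n."""
    return [c / n for c in counts]


def mean_absolute_deviation(values):
    """Average distance between each value and the mean of the values."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


counts_5 = [2, 3, 3, 3, 4, 1, 2, 3, 3, 2]
counts_20 = [14, 7, 12, 12, 10, 10, 9, 10, 11, 11]

mad_5 = mean_absolute_deviation(proportions(counts_5, 5))
mad_20 = mean_absolute_deviation(proportions(counts_20, 20))

print(round(mad_5, 3))   # 0.136
print(round(mad_20, 3))  # 0.07
```

Dividing every number in a set by n divides its mean absolute deviation by n as well, so these values are just 0.68 / 5 and 1.4 / 20.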



IV. The Method of Comparing Ranges

The 1st set of numbers (proportion of U's in groups of 5 drops) is

0.2, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.8

The smallest number is 0.2, and the largest is 0.8. Consequently, the range is 0.8 - 0.2 = 0.6.

The 2nd set of numbers (proportion of U's in groups of 20 drops) is

0.35, 0.45, 0.50, 0.50, 0.50, 0.55, 0.55, 0.60, 0.60, 0.70

The largest is 0.70, and the smallest is 0.35. Consequently the range is 0.70 - 0.35 = 0.35.

V. The Method of "Trimmed" Ranges

For the 1st set of numbers, we delete the two largest and the two smallest, to get a "trimmed" set of numbers:

0.4, 0.4, 0.6, 0.6, 0.6, 0.6

The range is now 0.6 - 0.4 = 0.2.

For the 2nd set of numbers (groups of 20 drops), if we omit the 2 largest and 2 smallest we get the "trimmed" set of numbers:

0.50, 0.50, 0.50, 0.55, 0.55, 0.60

The range of this "trimmed" set is 0.60 - 0.50 = 0.10.

Question #3: Does the total number of U's vary more in large samples, or in small samples?

Question #4: Does the proportion of U's vary more in large samples, or in small samples?



Question #5: Can you summarize what we have learned? What does your data seem to indicate?

Section IV

Variability of Total Number of U's, and of Proportion of U's, in Large Samples and Small Samples

(Summary of Sections I-III)

Alex says that mathematicians talk about our thumbtack experiment this way:

When we were using 10 groups of 5 drops each, they would say we had a sample size n, equal to 5.

When we were using 10 groups of 20 drops each, they would say we had a sample size n, equal to 20.

In general, when we increase the sample size by making it 4 times as big, the variability of the total number of U's would be expected to increase by a factor of 2. Consequently, mathematicians say that the variability of the total number of U's increases as √n.

For the fractional proportion of U's, however, the situation is quite different. Here, if we multiply the sample size by 4, the variability of the fractional proportion of U's decreases by a factor of 2. Consequently, mathematicians say that the variability of the fractional proportion of U's decreases like 1/√n.
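The factor-of-2 behaviour can be illustrated with a measure of variability that this text does not introduce: the standard deviation. The sketch below assumes a simple model in which every drop is independent and lands "Up" with the same chance p; under that model the standard deviation of the total is √(n·p·(1−p)) and that of the proportion is √(p·(1−p)/n). The value p = 0.5 and the function names are hypothetical, chosen only for the example.

```python
import math


def sd_total(n, p):
    """Standard deviation of the number of U's in n independent drops."""
    return math.sqrt(n * p * (1 - p))


def sd_proportion(n, p):
    """Standard deviation of the fraction of U's in n independent drops."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5  # hypothetical chance of "Up"; a real tack may differ

# Going from n = 5 to n = 20 multiplies the sample size by 4.
ratio_total = sd_total(20, p) / sd_total(5, p)
ratio_proportion = sd_proportion(20, p) / sd_proportion(5, p)

print(round(ratio_total, 6))       # 2.0
print(round(ratio_proportion, 6))  # 0.5
```

The ratios do not depend on the value chosen for p. The mean absolute deviation and the range used in earlier sections are different measures, so a small set of real data can be expected to follow these factors only roughly.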

Question 1: Is this what your data seemed to indicate?

Question 2: Could you come closer, in predicting the number of U's, from a small number of drops, or from a large number of drops?

Question 3: Does your data become more variable or less variable, as you go to larger-sized samples?

Question 4: Can you summarize what we have learned?

Question 5: Why do you think we use fractions so much in the theory of probability?



Section V

A Telephone Book Experiment

Experiment II. Look at some "randomly chosen" page well into the phone book. Make a record of the last digit of the 1st 40 numbers of the page, grouping by fives as usual. Each student should collect this data independently, so that we can combine into a "big experiment." From this record, determine the frequency of occurrence of each digit, and the relative frequency of each. Study the variability of these frequencies as a function of sample size, as was done in Experiment I.
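Counting the frequency and relative frequency of each digit is easy to mechanize. The following Python sketch uses the standard `collections.Counter`; the function name `digit_frequencies` is illustrative, and the ten sample digits are the first two groups of five from Harold's record given below.

```python
from collections import Counter


def digit_frequencies(digits):
    """Map each digit 0-9 to (count, relative frequency) in the sample."""
    counts = Counter(digits)
    n = len(digits)
    return {d: (counts.get(d, 0), counts.get(d, 0) / n) for d in range(10)}


# One group of 10 last digits (two groups of five).
sample = [4, 9, 5, 6, 3, 5, 6, 5, 4, 4]

frequencies = digit_frequencies(sample)
for digit, (count, fraction) in frequencies.items():
    print(digit, count, fraction)
```

Applying the same function to groups of 5, 10, and 20 digits gives the kind of tables shown in this section.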

Here is some typical data (although you will undoubtedly want to work with data collected by your own class):

Harold's data:

4, 9, 5, 6, 3
5, 6, 5, 4, 4
2, 9, 4, 2, 2
8, 0, 8, 4, 3
4, 4, 0, 5, 6
0, 9, 8, 0, 8
0, 2, 0, 7, 3
0, 1, 3, 4, 6

Judy's data:

3, 3, 4, 7, 0
5, 3, 0, 9, 6
2, 8, 7, 9, 4
8, 5, 9, 6, 4
9, 5, 9, 3, 9
8, 9, 9, 6, 7
1, 7, 7, 9, 7
1, 6, 6, 8, 8



o 




Using only Harold's and Judy's data we find:

digit   total number of occurrences      relative proportion of occurrences
0       0, 0, 0, 1, 1, 1, 2, 1, 0, 1     0, 0, 0, 1/5, 1/5, 1/5, 2/5, 1/5, 0, 1/5
1       0, 0, 0, 0, 0, 0, 0, 1, 0, 0     0, 0, 0, 0, 0, 0, 0, 1/5, 0, 0
2       0, 0, 3, 0, 0, 1, 2, 1, 0, 1     0, 0, 3/5, 0, 0, 1/5, 2/5, 1/5, 0, 1/5
3       1, 0, 0, 1, 0, 0, 1, 0, 2, 1     1/5, 0, 0, 1/5, 0, 0, 1/5, 0, 2/5, 1/5
4       1, 2, 1, 1, 2, 0, 0, 1, 1, 0     1/5, 2/5, 1/5, 1/5, 2/5, 0, 0, 1/5, 1/5, 0
5       1, 2, 0, 0, 1, 0, 0, 0, 0, 1     1/5, 2/5, 0, 0, 1/5, 0, 0, 0, 0, 1/5
6       1, 1, 0, 0, 1, 0, 0, 1, 0, 1     1/5, 1/5, 0, 0, 1/5, 0, 0, 1/5, 0, 1/5
7       0, 0, 0, 0, 0, 0, 1, 0, 1, 0     0, 0, 0, 0, 0, 0, 1/5, 0, 1/5, 0
8       0, 0, 0, 2, 0, 2, 0, 1, 1, 0     0, 0, 0, 2/5, 0, 2/5, 0, 1/5, 1/5, 0
9       1, 0, 1, 0, 0, 2, 0, 0, 0, 1     1/5, 0, 1/5, 0, 0, 2/5, 0, 0, 0, 1/5

In order to study 10 groups of 10, we need more raw data. Here is 
Marilyn's data: 

9, 2, 0, 4, 3 
7, 7, 7, 4 
6, 1, 4 

3, 3, 9, 7, 9 
7, G, 1, C, 0 
7, 9, 4, 1, 6

4, 1, 4, 9, C
4, 4, 1, 3, 7

Occurrences of Digits in Each of 10 Groups of 10

digit   total number of occurrences      relative proportion of occurrences
0       0, 1, 2, 3, 1, 0, 0, 0, 1, 0     0, 1/10, 2/10, 3/10, 1/10, 0, 0, 0, 1/10, 0
1       0, 0, 0, 1, 0, 0, 0, 2, 1, 0     0, 0, 0, 1/10, 0, 0, 0, 2/10, 1/10, 0
2       0, 3, 0, 1, 0, 1, 0, 0, 1, 1     0, 3/10, 0, 1/10, 0, 1/10, 0, 0, 1/10, 1/10
3       1, 1, 0, 1, 3, 0, 1, 0, 1, 2     1/10, 1/10, 0, 1/10, 3/10, 0, 1/10, 0, 1/10, 2/10
4       3, 2, 2, 1, 1, 2, 0, 0, 2, 1     3/10, 2/10, 2/10, 1/10, 1/10, 2/10, 0, 0, 2/10, 1/10
5       3, 0, 1, 0, 1, 1, 1, 0, 0, 0     3/10, 0, 1/10, 0, 1/10, 1/10, 1/10, 0, 0, 0
6       2, 0, 1, 1, 1, 1, 1, 2, 0, 1     2/10, 0, 1/10, 1/10, 1/10, 1/10, 1/10, 2/10, 0, 1/10
7       0, 0, 0, 1, 1, 1, 1, 3, 3, 1     0, 0, 0, 1/10, 1/10, 1/10, 1/10, 3/10, 3/10, 1/10
8       0, 2, 2, 1, 1, 2, 1, 2, 1, 0     0, 2/10, 2/10, 1/10, 1/10, 2/10, 1/10, 2/10, 1/10, 0
9       1, 1, 2, 0, 1, 2, 4, 1, 1, 2     1/10, 1/10, 2/10, 0, 1/10, 2/10, 4/10, 1/10, 1/10, 2/10



In order to consider 10 groups of 20 numbers each ("samples with n equal to 20"), we need more data:

Tom's data:

5, 8, 4, 5, 0
7, 5, 3, 3, 0
0, 8, 0, 9, 1
8, 2, 4, 0, 8
2, 5, 4, 2, 5
1, 1, 9, 7, 4
1, 9, 6, 6, 7
8, 9, 7, 6, 7

Bill's data:

0, 4, 6, 2, 3
0, 2, 5, 7, 2
0, 0, 5, 5, 1
0, 7, 5, 8, 9
9, 0, 2, 4, 7
7, 0, 8, 4, 1
4, 2, 1, 4, 9
0, 3, 1, 4, 4

Occurrences of Digits in Each of 10 Groups of 20

digit   total number of occurrences      relative proportion of occurrences
0       1, 5, 1, 0, 1, 2, 5, 0, 5, 3     1/20, 5/20, 1/20, 0, 1/20, 2/20, 5/20, 0, 5/20, 3/20
1       0, 1, 0, 2, 2, 4, 1, 3, 1, 3     0, 1/20, 0, 2/20, 2/20, 4/20, 1/20, 3/20, 1/20, 3/20
2       3, 1, 1, 0, 2, 0, 1, 2, 2, 2     3/20, 1/20, 1/20, 0, 2/20, 0, 1/20, 2/20, 2/20, 2/20
3       2, 1, 3, 1, 3, 1, 2, 0, 1, 1     2/20, 1/20, 3/20, 1/20, 3/20, 1/20, 2/20, 0, 1/20, 1/20
4       5, 3, 3, 0, 3, 4, 2, 2, 1, 6     5/20, 3/20, 3/20, 0, 3/20, 4/20, 2/20, 2/20, 1/20, 6/20
5       3, 1, 2, 1, 0, 0, 3, 2, 4, 0     3/20, 1/20, 2/20, 1/20, 0, 0, 3/20, 2/20, 4/20, 0
6       2, 2, 2, 2, 1, 1, 0, 3, 1, 0     2/20, 2/20, 2/20, 2/20, 1/20, 1/20, 0, 3/20, 1/20, 0
7       0, 1, 2, 4, 4, 3, 1, 3, 2, 2     0, 1/20, 2/20, 4/20, 4/20, 3/20, 1/20, 3/20, 2/20, 2/20
8       2, 3, 3, 3, 1, 2, 4, 1, 1, 1     2/20, 3/20, 3/20, 3/20, 1/20, 2/20, 4/20, 1/20, 1/20, 1/20
9       2, 2, 3, 6, 3, 2, 1, 3, 1, 2     2/20, 2/20, 3/20, 6/20, 3/20, 2/20, 1/20, 3/20, 1/20, 2/20

This table does not seem to show very decisive agreement with our generalization about variability. What do your data show?



Method of Trimmed Ranges

If we delete the two largest and two smallest members from each set, we get

trimmed set for total no. of occurrences

digit   group with n=5        group with n=20
0       0, 0, 1, 1, 1, 1      1, 1, 1, 2, 3, 5
1       0, 0, 0, 0, 0, 0      1, 1, 1, 2, 2, 3
2       0, 0, 0, 1, 1, 1      1, 1, 1, 2, 2, 2
3       0, 0, 0, 1, 1, 1      1, 1, 1, 1, 2, 2
4       0, 1, 1, 1, 1, 1      2, 2, 3, 3, 3, 4
5       0, 0, 0, 0, 1, 1      0, 1, 1, 2, 2, 3
6       0, 0, 0, 1, 1, 1      1, 1, 1, 2, 2, 2
7       0, 0, 0, 0, 0, 0      1, 2, 2, 2, 3, 3
8       0, 0, 0, 0, 1, 1      1, 1, 2, 2, 3, 3
9       0, 0, 0, 0, 1, 1      2, 2, 2, 2, 3, 3

For the trimmed ranges we get:

        trimmed range for total      trimmed range for fractional
        no. of occurrences           proportion of occurrences
digit   n=5       n=20               n=5       n=20
0       1         4                  0.2       0.2
1       0         2                  0         0.1
2       1         1                  0.2       0.05
3       1         1                  0.2       0.05
4       1         2                  0.2       0.1
5       1         3                  0.2       0.15
6       1         1                  0.2       0.05
7       0         2                  0         0.1
8       1         2                  0.2       0.1
9       1         1                  0.2       0.05

Method of Average Range

Combining our data for all digits, we can compute the average range and average trimmed range as follows:

                                        Average range       Average trimmed range
                                        n=5      n=20       n=5      n=20
Total no. of occurrences                1.8      3.9        0.8      1.9
Proportional fraction of occurrences    0.36     0.19       0.16     0.09

This table appears to fit in quite nicely with our generalization that the variability of total number of occurrences increases like √n, while the variability of the fractional proportion of occurrences decreases like 1/√n, as the sample size n increases.

What do your data show?
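The averaging step in the Method of Average Range can be sketched in Python. The lists below are the per-digit trimmed ranges for the total number of occurrences, digits 0 through 9, as read from the table of trimmed ranges; the helper name `average` is illustrative.

```python
def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Trimmed ranges of the total number of occurrences, digits 0 to 9.
trimmed_n5 = [1, 0, 1, 1, 1, 1, 1, 0, 1, 1]
trimmed_n20 = [4, 2, 1, 1, 2, 3, 1, 2, 2, 1]

avg_n5 = average(trimmed_n5)
avg_n20 = average(trimmed_n20)

print(avg_n5)   # 0.8
print(avg_n20)  # 1.9

# For proportions, divide each average by the sample size.
print(avg_n5 / 5, avg_n20 / 20)
```

With the sample size multiplied by 4, the average trimmed range of the totals roughly doubles, while the average for the proportions is roughly halved.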
Section VI

An Experiment with a Coin

Experiment III: Each member of your class can toss a coin 40 times, recording each occurrence of heads and tails in order. Keep these records in groups of 5 tosses each. Keep this data permanently; we can use it repeatedly in the future! You can study the variability of total number of heads, and fractional proportion of heads, as functions of the sample size n.

What do you expect to find?

Here is a record of coin tosses:






















H 


H 


B 


T 


H 




I 


H 


X 


X 


T 




X 


T 


T 


T 


H 




X 


B 


X 


K 


H 




H 


K 


H 


T 


X 




X 


X 


X 


H 


H 




11 


H 


H 


? 


X 




H 


X 


H 


H 


X 




H 


T 


T 


T 


I 




X 


H 


X 


X 


H_ 


_100 


H 


T 


R 


T 


B 




H 


H 


X 


I 


X 




H 


H 


T 


T 


I 




X 


X 


X 


X 


X 




H 


T 


T 


H 


X 




H 


X 


I 


X 


X 




H 


T 


X 


T 


E 




I 


X 


B 


H 


X 




H 


T 


T 


H 


X _ 


_50 


X 


X 


a 


I 


X 




T 


B 


H 


B 


B 




H 


X 


X 


H 


X 




H 


R 


B 


H 


K 




I 


I 


H 


X 


X 




H 


H 


T 


H 


I 




H 


X 


H 


X 


X 




T 


T 


R 


B 


X 




I 


B 


H 


X 


B 




T 


K 


T 


T 


I 




H 


X 


I 


X 


T_ 


L50 


B 


T 


T 


a 


X 




X 


I 


I 


I 


H 




H 


T 


T 


B 


K 




X 


H 


X 


I 


H 




H 


B 


B 


R 


I 




X 


X 


X 


H 


H 




H 


T 


T 


X 


B 




I 


X 


I 


H 


H 




H 


X 


B 


H 


a 




H 


X 


X 


X 


X 




Jk 


B 


B 


T 


X 




2 


X 


I 


I 


I 




X 


B 


T 


X 


B 




H 


I 


X 


I 


X 




R 


B 


T 


B 


B 




H 


X 


I 


B 


X 




H 


H 


B 


X 


H 




H 


I 


I 


I 


H 




H 


T 


T 


T 


I_ 


_200 


X 


H 


H 


X 




350 


You may want to get records of even more tosses: at least 2,000.

HHHIB




THETT 


1HTIH 




T H T H T 


T T T H H 


- * * 


BE i HH 


H1KIT 




.. T .T..T .H T 


H H T B T 




T T H T H 


T T T II H 




H H H H H 


H T II H H 




H H H X H 


HTHHH 




TTHBT 


T H I II H 




X H T X H 


HHT1H. 


_250 


BEEH X. 


IB1HH 




EIEET 


T H H T H 




XREBT 
I H X X X 


tit n 




H H T T T 




X I I II X 


IITBH 




BBBBB 


TBTIT 




BBEBB 


BTTBT 




H H K X H 


T T T H T 




X X X H I 


TBITT 




BTBBE 


IBHHB w 


_300 


H T X H H 


T T H T H 




BEEBE 


T H H T T 




H H X H H 


T T T H H 




HXTBE 


TBBHB 




IBTBH 


H t t ;ni 




X H H I X 


BBTHI 




HETEE 


BHTTH 




BBHTX 


TBTHH 




‘ 


H H H X T 




- . - - 


T H H T T 


500 


• 



400 



450 



(Section VI is temporarily left incomplete. In the completed version, one would treat this data as in the preceding sections, studying empirically the variability of totals and ratios as a function of sample size.)

(This coin data would also be used later for an empirical comparison of the "compensation" vs. "swamping" theories of the law of large numbers.)

Section VII

An Abstract Model for Chance Events

In the preceding sections, we have made empirical studies of variability, using thumbtacks, telephone directories, and coins. We have seen that as we make our samples larger, the variability of the total number of occurrences of (say) an "Up", or of a "head", becomes larger. However, the fractional proportion of "Ups" or "heads" varies less for larger samples.

Can we use this apparent stability of the fractional proportion of heads as the foundation for a mathematical model? We would like our model to help describe "chance" events. Let's see if we can make one that will have some usefulness.

To begin with, let us think of the example of the last digit of a telephone number. We can make a 2-dimensional graph by representing the possible outcomes along the horizontal axis

[Graph: horizontal axis marked 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

and representing the fractional proportion of occurrences along the vertical axis. Suppose, for 2 numbers, the last digit of one was 7, and of the other was 4. Then the fractional proportion of occurrences would be

Digit                                  0    1    2    3    4    5    6    7    8    9
Fractional proportion of occurrence    0    0    0    0    1/2  0    0    1/2  0    0

and the corresponding graphical representation would be

[Graph: fractional proportion against digit, with height 1/2 at digits 4 and 7]

Suppose the experiment of selecting 2 numbers were repeated, and the final digits were 2 and 4. This new graph would then look like this:

[Graph: fractional proportion against digit, with height 1/2 at digits 2 and 4]
We can make a 3-dimensional picture by arranging these two planes parallel at two different points on a "time" axis:

[Figure: the two graphs placed at successive points along a time axis]

For a series of coin tosses, we can similarly represent the outcome by a 3-dimensional picture as follows:

[Figure: outcomes of successive coin tosses arranged along a time axis]

From this picture, we can see that the outcome of the 1st toss was "Heads", the outcome of the 2nd toss was "Heads", of the 3rd also "Heads", the outcome of the 4th toss was "Tails", and so on.

Now what did we seem to be observing in our empirical studies of probability? For one thing, we computed the fractional average, not of a single toss, but cumulatively over many tosses. We took a fairly long section along the time axis and computed an average for all of the tosses included within this time interval. The resulting 2-dimensional graph might look like this:

[Graph: fractional proportion of each outcome, averaged over a long section of the time axis]

If we take longer and longer sections along the time axis, the variability of the fractional proportion of occurrences will become smaller and smaller. The fractions appear to be "homing in" on some constant values, from which they do not deviate very much in large samples.
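The idea of averaging over longer and longer sections of the time axis can be tried out with simulated tosses. The sketch below is an illustration only: it uses Python's pseudo-random generator in place of a real coin, the seed 0 is an arbitrary choice, and the function name `running_proportions` is made up for the example.

```python
import random


def running_proportions(outcomes, target):
    """Fraction of `target` among the first 1, 2, 3, ... outcomes."""
    hits = 0
    result = []
    for i, outcome in enumerate(outcomes, start=1):
        if outcome == target:
            hits += 1
        result.append(hits / i)
    return result


rng = random.Random(0)  # arbitrary seed, so the run can be repeated
tosses = [rng.choice("HT") for _ in range(2000)]
fractions = running_proportions(tosses, "H")

# Proportion of heads after 10, 100, 1000 and 2000 simulated tosses.
for n in (10, 100, 1000, 2000):
    print(n, round(fractions[n - 1], 3))
```

The same function can be applied to a real record of tosses, or of thumbtack drops with `target` set to "U".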

We might, then, base our model upon the idea of a long-range average which can represent, as an average, an extended section along the time axis.

Question I. What do you think a 2-dimensional "long-range average" would look like for:

a) the thumbtack experiment

b) the last digit of telephone numbers

c) the coin-tossing experiment.

Question II. If you computed a 2-dimensional graph of fractional occurrences from a very long average along the time axis, would your 2-dimensional graph be relevant to some other long average along the time axis?

We evidently can get slightly different, but quite similar, graphs by averaging over different long sections of the time axis. It is convenient to assume a "limiting" graph towards which our long-range average graphs are tending.

We can frequently use logical analysis to determine what this "limiting" graph should be. In the case of the coin-tossing experiment, we can argue that the coin is reasonably symmetric, and so each side should be as likely as the other. Consequently, we can expect a "limiting" graph like this:

[Graph: H and T, each with height 1/2]

Such logic, unfortunately, fails us in the case of the thumbtack, and we are forced to rely upon our long-range averages computed from empirical data.

For the coin we have a good theory; for the thumbtacks we have none at all.

The case of the last digit of the telephone numbers lies somewhere in between: we might believe that all digits are equally likely, on the grounds that the telephone company uses essentially consecutive numbers without gaps. On the other hand, it is harder to be sure just how telephone numbers are assigned, and so we are less confident that all digits really are equally likely. It is, however, possible to compare our "equally-likely" theoretical limit graph against graphs obtained empirically from long averages along the time axis. This comparison might be quite interesting.



5 . 



We shall make one further modification of our 2-dimensional "limit" graph.
The various outcomes of an experiment are usually things like "heads", "tails",
"point-up", "point-down", and so on. These outcomes do not naturally arrange
themselves along a number line. We shall consequently dispense with the graphical
arrangement, and shall concern ourselves only with the set of possible outcomes,
which we shall call a sample space.



Examples:

1) If we toss a coin once, the set of possible outcomes (or "sample
space") might evidently be written {H, T}.

2) If we toss a single die, it can come to rest showing 1, 2, 3, 4,
5, or 6 on its uppermost face. We can represent this set of possible outcomes
as {1, 2, 3, 4, 5, 6}.

3) If we toss one dime and one quarter, we can list the outcome in a
definite order, giving the outcome for the dime first, then the outcome for
the quarter. Thus, HT would mean the dime showed heads, the quarter showed
tails. Using this convention, the sample space might be written
{HH, HT, TH, TT}.

4) If we throw two dice simultaneously, and care only about the total
obtained by adding the two numbers on the uppermost faces, we might write
the sample space this way:

{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

5) For our thumbtack, the sample space might be written {U, D}, where
"U" means the tack came to rest point-up, and "D" means that the tack came
to rest point-down.






We have replaced our horizontal axis by a simple listing of the possible
outcomes of an experiment. We must, however, retain the numerical values which our
2-dimensional limit graph exhibited along the vertical axis. We shall do this
by means of a function f whose range is a subset of the set of real numbers.

Examples:

1) For our single coin experiment, the sample space is {H, T}
and the function f is defined as

    $f(H) = \tfrac{1}{2}$
    $f(T) = \tfrac{1}{2}$

2) For the thumbtack experiment, use your own data to determine f(U)
and f(D). Depending upon the kind of thumbtack that you used, the surface onto
which it fell, and the method of dropping it, you may get different ratios of
U's and D's. If, in a drops, you got b U's and a - b D's, then your estimated
limit graph might result in this function:

    $f(U) = \dfrac{b}{a}$
    $f(D) = \dfrac{a - b}{a}$

Question III. Even without knowing the actual experiment and the actual sample
space that someone has in mind, can you describe certain limitations on the
function f which he must use?







The Use of Tree Graphs 



The task of deciding upon a sample space is sometimes simplified by using
a "tree graph". We can illustrate this method by an example:¹

Three-child families. To study the distribution of boys and girls in
families having three children, a survey of such families is made. What is
a sample space for the experiment of drawing one family from a population
of three-child families? We can construct a "tree graph" like this:



    1st (oldest)    2nd child    3rd child    Sample
    child                                     space

    Boy             Boy          Boy          BBB
    Boy             Boy          Girl         BBG
    Boy             Girl         Boy          BGB
    Boy             Girl         Girl         BGG
    Girl            Boy          Boy          GBB
    Girl            Boy          Girl         GBG
    Girl            Girl         Boy          GGB
    Girl            Girl         Girl         GGG



In the usual set notation, we could write the sample space as
{BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}.

Suggested continuation of Section VII

1) Discuss "events" as subsets of the sample space.

2) Describe the function f, extending its domain to the set of subsets of
the sample space. Include additive property.



1. This example is quoted from Probability: A First Course, by Mosteller,
Rourke and Thomas (Addison-Wesley, 1961), pp. 64, 65.

Section II



Permanent Experiment #1

Why don't you perform a big thumbtack experiment with your class?
If you have 20 or more people in class, have each person drop a tack 40
times. Have him record each "Up" or "Down" as it occurs, and separate
his answers into groups of 5 each. Then, by combining groups, you will be
able to get 10 groups of 5, or 10 groups of 10, or 10 groups of 20, or
10 groups of 40, or 10 groups of 80.

Keep all your data! We will be able to make use of it again
in the future.



Is the number of U's more predictable in a large sample,
or in a small sample?

A. Record the number of U's in each of 10 groups of 5:

   ___  ___  ___  ___  ___  ___  ___  ___  ___  ___

B. Record the number of U's in each of 10 groups of 10:

   ___  ___  ___  ___  ___  ___  ___  ___  ___  ___

C. Record the number of U's in each of 10 groups of 20:

   ___  ___  ___  ___  ___  ___  ___  ___  ___  ___

D. Record the number of U's in each of 10 groups of 40:

   ___  ___  ___  ___  ___  ___  ___  ___  ___  ___

E. Record the number of U's in each of 10 groups of 80:

   ___  ___  ___  ___  ___  ___  ___  ___  ___  ___



Question 1: Looking at your data above, where is it easier to predict
the number of U's: in groups of 5, in groups of 20, or in groups of 80?

Question 2: Can you describe what we mean by the "variability" in a set of
numbers? Which set of numbers shows the greatest variability, the set
recorded under A, the set recorded under C, or the set recorded under E?

We need some good methods for studying how much "variation" there
is in a set of numbers. Here are 5 methods:¹

1. We shall take our data from Section I. Why don't you use data
from the experiment that your class did?



I. The Method of "Just Looking".

For groups of 5, we got these numbers (counting "Ups"):

    2, 3, 3, 3, 4, 1, 2, 3, 3, 2.

For groups of 20, we got these numbers:

    14, 7, 12, 12, 10, 10, 9, 10, 11, 11.

By just looking at these numbers, which set of numbers seems to
show greater "variation"?

II. The Method of Graphs.

We can show the first set of numbers on a graph like this:




[Graph: number of U's in each group (1 to 17) along the horizontal axis; number of occurrences along the vertical axis; one x marked for each group.]
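III. The Method of Mean Absolute Deviation from the Sample Mean.

The mean of the first set of numbers is 26/10 = 2.6. The absolute deviations
from 2.6 are 0.6, 0.4, 0.4, 0.4, 1.4, 1.6, 0.6, 0.4, 0.4, 0.6, and their
average is 6.8/10 = 0.68.

This is the mean absolute deviation for the first set of numbers (groups
of 5 drops).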

Now, let's do the same thing with our second set of numbers (groups of 20
drops):

    |14 - 10.6| = 3.4
    | 7 - 10.6| = 3.6
    |12 - 10.6| = 1.4
    |12 - 10.6| = 1.4
    |10 - 10.6| = 0.6
    |10 - 10.6| = 0.6
    | 9 - 10.6| = 1.6
    |10 - 10.6| = 0.6
    |11 - 10.6| = 0.4
    |11 - 10.6| = 0.4

    3.4 + 3.6 + 1.4 + 1.4 + 0.6 + 0.6 + 1.6 + 0.6 + 0.4 + 0.4 = 14

    14/10 = 1.4

This is the mean absolute deviation for the second set of numbers
(groups of 20 drops).

From this method of comparison, which set of numbers seems to vary more? 
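The arithmetic above can be checked with a short program. This is only a sketch: the function name `mean_absolute_deviation` is ours, and the two lists are the sample counts quoted in the text for groups of 5 and groups of 20.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the set."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


# Number of U's per group, as given in the text.
groups_of_5 = [2, 3, 3, 3, 4, 1, 2, 3, 3, 2]
groups_of_20 = [14, 7, 12, 12, 10, 10, 9, 10, 11, 11]

# Rounded to two places to hide floating-point noise.
print(round(mean_absolute_deviation(groups_of_5), 2))
print(round(mean_absolute_deviation(groups_of_20), 2))
```

You can substitute your own class data for the two lists.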



IV. The Method of Comparing Ranges.

For the first set of numbers

    2, 3, 3, 3, 4, 1, 2, 3, 3, 2

the smallest is 1 and the largest is 4. The range, therefore, is

    4 - 1 = 3.



For the second set of numbers

    14, 7, 12, 12, 10, 10, 9, 10, 11, 11,

the smallest is 7 and the largest is 14. The range, therefore, is 14 - 7 = 7.

From comparing the ranges, which set of numbers seems to show the greater
variability?

V. The Method of Comparing Trimmed Ranges.

To use this method, we arrange the numbers in order of size:

    1, 2, 2, 2, 3, 3, 3, 3, 3, 4

    7, 9, 10, 10, 10, 11, 11, 12, 12, 14.

We then "trim" each set by discarding (say) the two largest and
the two smallest numbers in each:

    2, 2, 3, 3, 3, 3

    10, 10, 10, 11, 11, 12

For these "trimmed" sets of numbers, we compute the ranges:

    3 - 2 = 1     trimmed range for first set of numbers (groups of 5)

    12 - 10 = 2   trimmed range for second set of numbers (groups of 20).

By using this method of comparison, which set of numbers seems to show
greater variability?
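Methods IV and V can be sketched in a few lines. The function names `value_range` and `trimmed_range`, and the parameter `trim` (how many numbers to discard from each end), are our own labels; the data are the two sets used in the text.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def trimmed_range(values, trim=2):
    """Range after discarding the `trim` smallest and `trim` largest values."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return max(kept) - min(kept)


groups_of_5 = [2, 3, 3, 3, 4, 1, 2, 3, 3, 2]
groups_of_20 = [14, 7, 12, 12, 10, 10, 9, 10, 11, 11]

print(value_range(groups_of_5), value_range(groups_of_20))
print(trimmed_range(groups_of_5), trimmed_range(groups_of_20))
```

Changing `trim` lets you see how strongly the answer depends on how many extreme values are discarded.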

Question 3: Which set of numbers, in your data, shows greater variability,
the set recorded under C, or the set recorded under E?

Question 4: Can you predict the total number of "Ups" more accurately in
small numbers of tosses, or in large numbers of tosses?

Question 5: If we want to get a set of numbers showing twice as much
variability, should we use sample sizes twice as large? One-half as large?
Four times as large? One-fourth as large? Or what?

Question 6: Do you know how mathematicians express the answer to Question 5?

Question 7: What advantages and disadvantages can you find to help choose
between the 5 different methods for comparing variability?

Question 8: Jerry says that, even though the second set of numbers seemed
to show more variability, there is some sense in which it really shows less
variability. What do you think? How would you suggest we deal with these
two sets of numbers?

Section III

Proportional Occurrence of U's

Question 1: Ellen says that even though the total number of U's is harder
to predict for larger samples, the proportional occurrence of U's is easier
to predict for larger samples. What do you think? What does your data
suggest?

Test Ellen's suggestion by each of our 5 methods for comparing
variability. In Section II we compared the variability of the total number
of U's. We now compare the variability of the proportional or fractional
number of U's.

Question 2: How do you expect the variability of the fractional number of
U's in the 5-drop case will compare with the fractional number of U's in
the 20-drop case?

Method I:

The fractional number of U's in the 5-drop case can be found by taking the
total number of U's:

    2, 3, 3, 3, 4, 1, 2, 3, 3, 2

and dividing by the total number of drops (in this case, 5):

    2/5, 3/5, 3/5, 3/5, 4/5, 1/5, 2/5, 3/5, 3/5, 2/5

For the groups of 20 drops we get

    14/20, 7/20, 12/20, 12/20, 10/20, 10/20, 9/20, 10/20, 11/20, 11/20

Can you tell, by just looking, which set of numbers varies more?

Method II: Comparison by Graphs.

We shall mark both sets of numbers on the same graph, using x's to
indicate the 1st set, and 0's to indicate the 2nd set:



[Graph: fractional number of U's per group along the horizontal axis; frequency of occurrence in the set of numbers along the vertical axis. The x's mark the 1st set and the 0's mark the 2nd set.]



Which set of numbers shows greater consistency? Which shows greater
variability? Did it work out the way you expected?



Method III: Comparison of Mean Absolute Deviation from the Sample Mean.

Evidently, the mean for the 1st set of numbers must be

    2.6/5 = 0.52

The absolute deviations from 0.52 are

    |0.4 - 0.52| = 0.12
    |0.6 - 0.52| = 0.08
    |0.6 - 0.52| = 0.08
    |0.6 - 0.52| = 0.08
    |0.8 - 0.52| = 0.28
    |0.2 - 0.52| = 0.32
    |0.4 - 0.52| = 0.12
    |0.6 - 0.52| = 0.08
    |0.6 - 0.52| = 0.08
    |0.4 - 0.52| = 0.12

Averaging these deviations, we get

    0.12 + 0.08 + 0.08 + 0.08 + 0.28 + 0.32 + 0.12 + 0.08 + 0.08 + 0.12 = 1.36

    1.36/10 = 0.136

This is the mean absolute deviation for the 1st set of
numbers (groups of 5 drops).



We can now do the same thing for the 2nd set of numbers (groups of 20
drops):

The mean is

    10.6/20 = 0.53.

The absolute deviations from this mean are:

    |0.70 - 0.53| = 0.17
    |0.35 - 0.53| = 0.18
    |0.60 - 0.53| = 0.07
    |0.60 - 0.53| = 0.07
    |0.50 - 0.53| = 0.03
    |0.50 - 0.53| = 0.03
    |0.45 - 0.53| = 0.08
    |0.50 - 0.53| = 0.03
    |0.55 - 0.53| = 0.02
    |0.55 - 0.53| = 0.02

    0.17 + 0.18 + 0.07 + 0.07 + 0.03 + 0.03 + 0.08 + 0.03 + 0.02 + 0.02 = 0.7

    0.7/10 = 0.07

This is the mean absolute deviation for the 2nd set of
numbers (groups of 20 drops, using proportion of U's
rather than total number of U's).



Which set of numbers seems to vary more? How much more? 
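To answer "how much more?", the two mean absolute deviations can be computed exactly. The sketch below uses Python's `Fraction` type so that no rounding occurs; the helper names `proportions` and `mean_absolute_deviation` are ours, and the counts are the sample data from the text.

```python
from fractions import Fraction


def proportions(totals, n):
    """Turn counts of U's per group into fractional proportions."""
    return [Fraction(t, n) for t in totals]


def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the set."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


mad_5 = mean_absolute_deviation(proportions([2, 3, 3, 3, 4, 1, 2, 3, 3, 2], 5))
mad_20 = mean_absolute_deviation(
    proportions([14, 7, 12, 12, 10, 10, 9, 10, 11, 11], 20)
)

print(float(mad_5), float(mad_20))  # the two mean absolute deviations
print(float(mad_5 / mad_20))        # how many times larger the first is
```

For this particular sample the ratio comes out a little under 2.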



IV. The Method of Comparing Ranges

The 1st set of numbers (proportion of U's in groups of 5 drops) is

    0.2, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.8

The smallest number is 0.2, and the largest is 0.8. Consequently,
the range is

    0.8 - 0.2 = 0.6.

The 2nd set of numbers (proportion of U's in groups of 20 drops) is

    0.35, 0.45, 0.50, 0.50, 0.50, 0.55, 0.55, 0.60, 0.60, 0.70.

The largest is 0.70, and the smallest is 0.35. Consequently the
range is 0.70 - 0.35 = 0.35.



V. The Method of "Trimmed" Ranges

For the 1st set of numbers, we delete the two largest and the two smallest,
to get a "trimmed" set of numbers:

    0.4, 0.4, 0.6, 0.6, 0.6, 0.6.

The range is now 0.6 - 0.4 = 0.2.

For the 2nd set of numbers (groups of 20 drops), if we omit the 2 largest
and 2 smallest we get the "trimmed" set of numbers:

    0.50, 0.50, 0.50, 0.55, 0.55, 0.60

The range of this "trimmed" set is 0.60 - 0.50 = 0.10.

Question #3: Does the total number of U's vary more in large samples, or in
small samples?

Question #4: Does the proportion of U's vary more in large samples, or in
small samples?

Question #5: Can you summarize what we have learned? What does your data
seem to indicate?

Section IV

Variability of Total Number of U's, and of Proportion of U's, in Large
Samples and in Small Samples

(Summary of Sections I-III)

Alex says that mathematicians talk about our thumbtack experiment this way:
When we were using 10 groups of 5 drops each, they would say we had a
sample size n, equal to 5.

When we were using 10 groups of 20 drops each, they would say we had
a sample size n, equal to 20.

In general, when we increase the sample size by making it 4 times as big,
the variability of the total number of U's would be expected to increase by
a factor of 2. Consequently, mathematicians say that the variability of
the total number of U's increases as $\sqrt{n}$.

In the fractional proportion of U's, however, the situation is quite
different. Here, if we multiply the sample size by 4, the variability of
the fractional proportion of U's decreases by a factor of 2. Consequently,
mathematicians say that the variability of the fractional proportion of U's
decreases like $1/\sqrt{n}$.
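One way to see this square-root behaviour is to calculate a theoretical measure of variability. The sketch below rests on assumptions the text does not make: it uses the standard deviation (a measure not among our 5 methods), it treats the drops as independent, and it assumes a fixed chance `p` of "Up" on each drop. The value `p = 0.5` is chosen only for illustration.

```python
import math


def sd_of_total(n, p):
    """Standard deviation of the number of U's in n independent drops."""
    return math.sqrt(n * p * (1 - p))


def sd_of_proportion(n, p):
    """Standard deviation of the fraction of U's in n independent drops."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5  # assumed chance of "Up"; replace with an estimate from your own data
for n in (5, 20, 80):
    print(n, round(sd_of_total(n, p), 3), round(sd_of_proportion(n, p), 3))
```

Each time n is multiplied by 4, the first column of results doubles and the second is cut in half, whatever value of `p` is assumed.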

Question 1: Is this what your data seemed to indicate?

Question 2: Could you come closer, in predicting the number of U's, from
a small number of drops, or from a large number of drops?

Question 3: Does your data become more variable or less variable, as you go
to larger-sized samples?

Question 4: Can you summarize what we have learned?

Question 5: Why do you think we use fractions so much in the theory of
probability?

Section V

A Telephone Book Experiment

Experiment II. Look at some "randomly chosen" page well into the phone
book. Make a record of the last digit of the first 40 numbers on the page,
grouping by fives as usual. Each student should collect this data
independently, so that we can combine it into a "big experiment." From this record,
determine the frequency of occurrence of each digit, and the relative
frequency of each. Study the variability of these frequencies as a function
of sample size, as was done in Experiment I.

Here is some typical data (although you will undoubtedly want to work
with data collected by your own class):

Harold's data:

    4, 9, 5, 6, 3
    6, 5, 4, 4
    2, 9, 4, 2, 4
    8, 0, S, 4, 3
    4, 4, 0, 5, 6
    8, 9, 8, 0, 3
    0, 2, 0, 7, 3
    8, 1, 3, 4, 6

Judy's data:

    3, 3, 4, 7, 3
    5, 3, 0, 9, 6
    2, 8, 7, 9, 4
    C, 5, 9, 6, 4
    9, 5, 9, 3, 9
    8, 9, 9, 6, 7
    1, 7, 7, 9, ?
    1, 6, 6, 8, 8

Bill's data:

    0, 4, 6, 2, 3
    0, 2, 5, 7, 2
    0, 0, 5, 5, 1
    0, 7, 5, 8, 9
    9, 0, 2, 4, 7
    7, 0, 8, 4, 1
    4, 2, 1, 4, 9
    0, 3, 1, 4, 4
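Counting the frequency and relative frequency of each digit is easy to do by program. The sketch below uses Bill's 40 digits as we read them from the record above (two entries in his second and third rows are hard to read in the original, so treat the list as an assumption). The function names are ours.

```python
from collections import Counter

# Bill's last digits, in groups of five, as read from the record above.
bill_digits = [
    0, 4, 6, 2, 3,
    0, 2, 5, 7, 2,
    0, 0, 5, 5, 1,
    0, 7, 5, 8, 9,
    9, 0, 2, 4, 7,
    7, 0, 8, 4, 1,
    4, 2, 1, 4, 9,
    0, 3, 1, 4, 4,
]


def digit_frequencies(digits):
    """Number of times each digit 0-9 occurs (0 if it never occurs)."""
    counts = Counter(digits)
    return {d: counts.get(d, 0) for d in range(10)}


def relative_frequencies(digits):
    """Fraction of the record taken up by each digit 0-9."""
    total = len(digits)
    return {d: c / total for d, c in digit_frequencies(digits).items()}


print(digit_frequencies(bill_digits))
print(relative_frequencies(bill_digits))
```

If all digits were equally likely, each relative frequency would be near 0.1; a sample of only 40 digits can differ from that quite a lot.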

Occurrences of Digits in Each of 10 Groups of 20

    Digit    Total number of occurrences
    0        1, 5, 1, 0, 1, 2, 5, 0, 5, 3
    1        0, 1, 0, 2, 2, 4, 1, 3, 1, 3
    2        3, 1, 1, 0, 2, 0, 1, 2, 2, 2
    3        2, 1, 3, 1, 3, 1, 2, 0, 1, 1
    4        5, 3, 3, 0, 3, 4, 2, 2, 1, 6
    5        3, 1, 2, 1, 0, 0, 3, 2, 4, 0
    6        2, 2, 2, 2, 1, 1, 0, 3, 1, 0
    7        0, 1, 2, 4, 4, 3, 1, 3, 2, 2
    8        2, 3, 3, 3, 1, 2, 4, 1, 1, 1
    9        2, 2, 3, 6, 3, 2, 1, 2, 1, 2

    Digit    Relative proportion of occurrences
    0        1/20, 5/20, 1/20, 0/20, 1/20, 2/20, 5/20, 0/20, 5/20, 3/20
    1        0/20, 1/20, 0/20, 2/20, 2/20, 4/20, 1/20, 3/20, 1/20, 3/20
    2        3/20, 1/20, 1/20, 0/20, 2/20, 0/20, 1/20, 2/20, 2/20, 2/20
    3        2/20, 1/20, 3/20, 1/20, 3/20, 1/20, 2/20, 0/20, 1/20, 1/20
    4        5/20, 3/20, 3/20, 0/20, 3/20, 4/20, 2/20, 2/20, 1/20, 6/20
    5        3/20, 1/20, 2/20, 1/20, 0/20, 0/20, 3/20, 2/20, 4/20, 0/20
    6        2/20, 2/20, 2/20, 2/20, 1/20, 1/20, 0/20, 3/20, 1/20, 0/20
    7        0/20, 1/20, 2/20, 4/20, 4/20, 3/20, 1/20, 3/20, 2/20, 2/20
    8        2/20, 3/20, 3/20, 3/20, 1/20, 2/20, 4/20, 1/20, 1/20, 1/20
    9        2/20, 2/20, 3/20, 6/20, 3/20, 2/20, 1/20, 2/20, 1/20, 2/20



We can now test the suggestion that the variability of totals increases
like $\sqrt{n}$ and the variability of fractional occurrences decreases like $1/\sqrt{n}$,
where n is the so-called "sample size."

We shall use three methods: the method of ranges, the method of trimmed
ranges, and the method of average ranges. The first two of these methods we
used in Experiment I; the method of average ranges will, however, be new.



Method of Ranges: For the total number of occurrences of the digit 0, in
sample sizes of 5 (n=5), we have:

    0, 0, 0, 0, 1, 1, 1, 1, 1, 2

Evidently, the range is 2 - 0 = 2.

For the total number of occurrences of the digit 0, in sample sizes of
20 (n=20), we have:

    0, 0, 1, 1, 1, 2, 3, 5, 5, 5

Evidently the range is 5 - 0 = 5. It is reasonably close to
our generalization that, if we multiply the sample size by 4,
we double the variability (in this case, we double the range).

Here are some further comparisons:

             Total number of occurrences    Fractional proportion of occurrences
    Digit    range for    range for         range for    range for
             n = 5        n = 20            n = 5        n = 20
    0        2            5                 0.4          0.25
    1        1            4                 0.2          0.2
    2        3            3                 0.6          0.15
    3        2            3                 0.4          0.15
    4        2            6                 0.4          0.3
    5        2            3                 0.4          0.15
    6        1            3                 0.2          0.15
    7        1            4                 0.2          0.2
    8        2            3                 0.4          0.15
    9        2            5                 0.4          0.25

This table does not seem to show very decisive agreement with our
generalization about variability. What do your data show?

Method of Trimmed Ranges

If we delete the two largest and two smallest members from each set, we
get

             Trimmed set for total no. of occurrences
    Digit    group with n=5          group with n=20
    0        0, 0, 1, 1, 1, 1        1, 1, 1, 2, 3, 5
    1        0, 0, 0, 0, 0, 0        1, 1, 1, 2, 2, 3
    2        0, 0, 0, 1, 1, 1        1, 1, 1, 2, 2, 2
    3        0, 0, 0, 1, 1, 1        1, 1, 1, 1, 2, 2
    4        0, 1, 1, 1, 1, 1        2, 2, 3, 3, 3, 4
    5        [illegible]             0, 1, 1, 2, 2, 3
    6        0, 0, 0, 1, 1, 1        1, 1, 1, 2, 2, 2
    7        0, 0, 0, 0, 0, 0        1, 2, 2, 2, 3, 3
    8        [illegible]             1, 1, 2, 2, 3, 3
    9        0, 0, 0, 0, 1, 1        2, 2, 2, 2, 3, 3


For the trimmed ranges we get:

             trimmed range for total    trimmed range for fractional
             no. of occurrences         proportion of occurrences
    Digit    n=5      n=20              n=5      n=20
    0        1        4                 0.2      0.2
    1        0        2                 0        0.1
    2        1        1                 0.2      0.05
    3        1        1                 0.2      0.05
    4        1        2                 0.2      0.1
    5        1        3                 0.2      0.15
    6        1        1                 0.2      0.05
    7        0        2                 0        0.1
    8        1        2                 0.2      0.1
    9        1        1                 0.2      0.05
Method of Average Range

Combining our data for all digits, we can compute the average range and
average trimmed range as follows:

                 Average range                       Average trimmed range
    Total no. of      Proportional fraction    Total no. of      Proportional fraction
    occurrences       of occurrences           occurrences       of occurrences
    n=5     n=20      n=5     n=20             n=5     n=20      n=5     n=20
    1.8     3.9       0.36    0.19             0.8     1.9       0.16    0.09



This table appears to fit in quite nicely with our generalization that the
variability of the total number of occurrences increases like $\sqrt{n}$, while the
variability of the fractional proportion of occurrences decreases like $1/\sqrt{n}$,
as the sample size n increases.

What do your data show? 
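The averages in the table can be recomputed from the ranges listed digit by digit earlier in this section. The sketch below does this for the untrimmed ranges of the totals; the list values are our reading of the range table, and the variable names are ours.

```python
# Ranges of the total number of occurrences, one entry per digit 0-9,
# as read from the range table earlier in this section.
range_total_n5 = [2, 1, 3, 2, 2, 2, 1, 1, 2, 2]
range_total_n20 = [5, 4, 3, 3, 6, 3, 3, 4, 3, 5]


def average(values):
    return sum(values) / len(values)


avg_n5 = average(range_total_n5)
avg_n20 = average(range_total_n20)

# Going from n = 5 to n = 20 multiplies the sample size by 4, so the
# generalization suggests a ratio near 2 for totals ...
ratio_totals = avg_n20 / avg_n5

# ... and near 1/2 for proportions (each range divided by its sample size).
ratio_proportions = (avg_n20 / 20) / (avg_n5 / 5)

print(avg_n5, avg_n20)
print(round(ratio_totals, 2), round(ratio_proportions, 2))
```

The two ratios come out somewhat above 2 and somewhat above 1/2, which is about as close as one could hope for with only 10 groups.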
Section VI



An Experiment with a Coin

Experiment III: Each member of your class can toss a coin 40 times,¹ recording
each occurrence of heads and tails in order. Keep these records in groups of
5 tosses each. Keep this data permanently; we can use it repeatedly in
the future! You can study the variability of total number of heads, and
fractional proportion of heads, as functions of the sample size n.

What do you expect to find? Here is the record of 2,000 tosses of
U.S. coins:



BKKTH 

IITTH 

HHHII 

H H H T T 

Him 

H T H T H 

H E T T T 

H T T H T 

HTTIH 

HIIH7 50 

7HHHH 

HHHH fi 

HHTH1 

X T H H T 

TKTTI 

H T X H T 

HI1HH 

HHHHT 

H T T T II 

H T H H H 

T H H T T 

THtTH 

HHTHH 

H H II T H 
Him 200 



T H T T T 

HIGH 

T T T H II I 

H T H H T 

T H T T H 100 

HHITT 

T T T T T 

H T T T T 

T T H B T 

T T II T T 

H T T II T 

HHTT 

H T H T T 

T II H T II 

H T T T T 150 

T T T T H 

THTTII 

T T T H H 

T T T H H 

H T T T T 

3IIII 

H T T T T 

H T T II T 

HTTTE 
T H II T H 350 



1. You may want to get records of even more tosses; perhaps a total of at
least 2,000.



H 11 H T H 




T H E T T 


T H T T H 




T H T H T 


T T T H H 




H H T H H 


H T H T T 




T T T H T 


H H T H T 




TTHTH 


T T T H H 




HHHRH 


BIHHH 




HHHTH 


HIKHH 




1IRH1 


THTHH 




T H T T H 


H H T T H_ 


J50 


HHHH T f 

i 


THTHH 




HTHH'l 


T H H T H 




T H H H T 


7 7 7 T T 




T H T T T 


H H T T T 




T T T 11 T 


T T T H H 




H H H H H 


T H T T T 




HH11HH 


H T T H T 




HHHTH 


TITHT 




T T T H T 


T H T T T 




H T H H H 


T H H H H_ 


_300 


H T T H H 


TTHTH 




H B H H H 


T H H T T 




H H T H H 


T T T H H 




BTTHH 


IHHHH 




THTHH 


BTTHH 




T H H T T 


HHTHI 




H H T H H 


HE » TH 


* 


H H H T T 



T H T H H 
HHH1T 
T H H T T 



500 



400 



450 






(Section VI is temporarily left incomplete. In the completed version,
one would treat this data as in the preceding sections, studying empirically
the variability of totals and ratios as a function of sample size.)
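As a sketch of how such a study might be carried out, the program below regroups a record of tosses into groups of n and reports the range of the totals and of the proportions. The 40-toss `sample_record` is made up for illustration and is not taken from the record above; the function name `heads_per_group` is ours.

```python
def heads_per_group(record, n):
    """Count heads in consecutive groups of n tosses.

    Characters other than H and T (such as spaces) are ignored, and any
    tosses left over at the end that do not fill a group are dropped.
    """
    tosses = [c for c in record if c in "HT"]
    usable = len(tosses) - len(tosses) % n
    return [tosses[i:i + n].count("H") for i in range(0, usable, n)]


# Hypothetical record of 40 tosses, written in groups of 5.
sample_record = "HTHHT THTTH HHTHT TTHHT HTTHH THHTH TTHTH HHTTH"

for n in (5, 10, 20):
    totals = heads_per_group(sample_record, n)
    fractions = [t / n for t in totals]
    print(n, totals,
          max(totals) - min(totals),                   # range of totals
          round(max(fractions) - min(fractions), 3))   # range of proportions
```

With a class record of several hundred tosses, the same loop can be run for n = 5, 20, 80 and so on.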

(This coin data would also be used later for an empirical comparison
of the "compensation" vs. "swamping" theories of the law of large numbers.)

Section VII.



An Abstract Model for Chance Events

In the preceding sections, we have made empirical studies of variability
using thumbtacks, telephone directories, and coins. We have seen that as we
make our samples larger, the variability of the total number of occurrences
of (say) an "Up", or of a "head", becomes larger. However, the fractional
proportion of "Ups" or "heads" varies less for larger samples.

Can we use this apparent stability of the fractional proportion of
heads as the foundation for a mathematical model? We would like our model
to help describe "chance" events. Let's see if we can make one that will
have some usefulness.

To begin with, let us think of the example of the last digit of a
telephone number. We can make a 2-dimensional graph by representing the
possible outcomes along the horizontal axis

[Horizontal axis marked with the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]



and representing the fractional proportion of occurrences along the vertical
axis. Suppose, for 2 numbers, the last digit of one was 7, and of the other
was 4. Then the fractional proportion of occurrences would be

    Digit                                    0    1    2    3    4    5    6    7    8    9
    Fractional proportion of occurrence      0    0    0    0   1/2   0    0   1/2   0    0

and the corresponding graphical representation would be

[Graph: the digits 0 to 9 along the horizontal axis, with height 1/2 at 4 and at 7 and height 0 elsewhere.]



Now what did we seem to be observing in our empirical studies of probability?
For one thing, we computed the fractional average, not of a single toss, but
cumulatively over many tosses. We took a fairly long section along the time axis
and computed an average for all of the tosses included within this time interval.
The resulting 2-dimensional graph might look like this:



[Graph: fractional proportion of occurrences along the vertical axis, for each possible outcome along the horizontal axis.]

If we take longer and longer sections along the time axis, the variability
of the fractional proportion of occurrences will become smaller and smaller. The
fractions appear to be "homing in" on some constant values, from which they do not
deviate very much in large samples.

We might, then, base our model upon the idea of a long-range average
which can represent, as an average, an extended section along the time axis:




[Graph: long-range average of the fractional proportion of occurrences.]



Question I. What do you think a 2-dimensional "long-range average" would look
like for:

a) the thumbtack experiment

b) the last digit of telephone numbers

c) the coin-tossing experiment.

Question II. If you computed a 2-dimensional graph of fractional occurrences
from a very long average along the time axis, would your 2-dimensional graph be
relevant to some other long average along the time axis?



We evidently can get slightly different, but quite similar, graphs by averaging
over different long sections of the time axis. It is convenient to assume a "limiting"
graph towards which our long-range average graphs are tending.

We can frequently use logical analysis to determine what this "limiting" graph
should be. In the case of the coin-tossing experiment, we can argue that the coin
is reasonably symmetric, and so each side should be as likely as the other.
Consequently, we can expect a "limiting" graph like this:




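[Graph: the outcomes H and T along the horizontal axis, each with fractional proportion 1/2 on the vertical axis.]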



Such logic, unfortunately, fails us in the case of the thumbtack, and we are
forced to rely upon our long-range averages computed from empirical data.

For the coin we have a good theory; for the thumbtacks we have none at all.
The case of the last digit of the telephone numbers lies somewhere in between: we
might believe that all digits are equally likely, on the grounds that the telephone
company uses essentially consecutive numbers without gaps. On the other hand, it is
harder to be sure just how telephone numbers are assigned, and so we are less
confident that all digits really are equally likely. It is, however, possible to
compare our "equally-likely" theoretical limit graph against graphs obtained empirically
from long averages along the time axis. This comparison might be quite interesting.

We shall make one further modification of our 2-dimensional "limit" graph.
The various outcomes of an experiment are usually things like "heads", "tails",
"point-up", "point-down", and so on. These outcomes do not naturally arrange
themselves along a number line. We shall consequently dispense with the graphical
arrangement, and shall concern ourselves only with the set of possible outcomes,
which we shall call a sample space.



Examples:

1) If we toss a coin once, the set of possible outcomes (or "sample
space") might evidently be written {H, T}.

2) If we toss a single die, it can come to rest showing 1, 2, 3, 4,
5, or 6 on its uppermost face. We can represent this set of possible outcomes
as {1, 2, 3, 4, 5, 6}.

3) If we toss one dime and one quarter, we can list the outcome in a
definite order, giving the outcome for the dime first, then the outcome for
the quarter. Thus, HT would mean the dime showed heads, the quarter showed
tails. Using this convention, the sample space might be written

{HH, HT, TH, TT}.

4) If we throw two dice simultaneously, and care only about the total
obtained by adding the two numbers on the uppermost faces, we might write
the sample space this way:

{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

5) For our thumbtack, the sample space might be written {U, D}, where
"U" means the tack came to rest point-up, and "D" means that the tack came
to rest point-down.
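These sample spaces can also be generated by program, which becomes useful when the list of outcomes is long. This is a minimal sketch; the variable names are ours, and outcomes are represented as short strings or whole numbers.

```python
from itertools import product

coin = {"H", "T"}
die = set(range(1, 7))

# Dime first, then quarter: every ordered pair of H/T.
dime_and_quarter = {dime + quarter for dime, quarter in product("HT", repeat=2)}

# Two dice, keeping only the total of the two uppermost faces.
two_dice_totals = {a + b for a, b in product(range(1, 7), repeat=2)}

thumbtack = {"U", "D"}

print(sorted(dime_and_quarter))
print(sorted(two_dice_totals))
```

Notice that 36 different pairs of faces collapse into only 11 different totals.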
We have replaced our horizontal axis by a simple listing of the possible
outcomes of an experiment. We must, however, retain the numerical values which our
2-dimensional limit graph exhibited along the vertical axis. We shall do this
by means of a function f whose range is a subset of the set of real numbers.

Examples:

1) For our single coin experiment, the sample space is {H, T}
and the function f is defined as

    $f(H) = \tfrac{1}{2}$
    $f(T) = \tfrac{1}{2}$

2) For the thumbtack experiment, use your own data to determine f(U)
and f(D). Depending upon the kind of thumbtack that you used, the surface onto
which it fell, and the method of dropping it, you may get different ratios of
U's and D's. If, in a drops, you got b U's and a - b D's, then your estimated
limit graph might result in this function:

    $f(U) = \dfrac{b}{a}$
    $f(D) = \dfrac{a - b}{a}$
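An estimate of this kind is simple to compute from a record of drops. The sketch below assumes the estimate intended is f(U) = b/a and f(D) = (a - b)/a; the function name `estimate_f` and its parameter names are ours. The example call uses the 106 U's in 200 drops from the sample groups of 20 in Section II.

```python
from fractions import Fraction


def estimate_f(drops, ups):
    """Estimate f(U) and f(D) from `ups` point-up landings in `drops` drops."""
    if drops <= 0:
        raise ValueError("need at least one drop")
    if ups < 0 or ups > drops:
        raise ValueError("ups must lie between 0 and drops")
    return {"U": Fraction(ups, drops), "D": Fraction(drops - ups, drops)}


# a = 200 drops, b = 106 U's (the ten sample groups of 20 taken together).
f = estimate_f(drops=200, ups=106)
print(f["U"], f["D"])
```

A different tack, surface, or way of dropping would be expected to give a different estimate.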

Question III. Even without knowing the actual experiment and the actual sample 
space that someone has in mind, can you describe certain limitations on the 
function f which he must use? 








The Use of Tree Graphs 



The task of deciding upon a sample space is sometimes simplified by using
a "tree graph". We can illustrate this method by an example:¹

Three-child families. To study the distribution of boys and girls in
families having three children, a survey of such families is made. What is
a sample space for the experiment of drawing one family from a population
of three-child families? We can construct a "tree graph" like this:



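    1st (oldest)    2nd child    3rd child    Sample
    child                                     space

    Boy             Boy          Boy          BBB
    Boy             Boy          Girl         BBG
    Boy             Girl         Boy          BGB
    Boy             Girl         Girl         BGG
    Girl            Boy          Boy          GBB
    Girl            Boy          Girl         GBG
    Girl            Girl         Boy          GGB
    Girl            Girl         Girl         GGG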



In the usual set notation, we could write the sample space as
{BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}.
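A tree graph of this kind amounts to making one choice after another, and a program can walk through the branches for us. The sketch below is one way to do it; the function name `three_child_sample_space` is ours, and "B" and "G" stand for boy and girl, listed oldest child first.

```python
from itertools import product


def three_child_sample_space():
    """List every oldest-to-youngest arrangement of boys (B) and girls (G)."""
    return ["".join(children) for children in product("BG", repeat=3)]


print(three_child_sample_space())
```

Changing `repeat=3` to another number gives the sample space for families of a different size; each extra child doubles the number of outcomes.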



Suggested continuation of Section VII

1) Discuss "events" as subsets of the sample space.

2) Describe the function f, extending its domain to the set of subsets of
the sample space. Include additive property.



1. This example is quoted from Probability: A First Course, by Mosteller,
Rourke and Thomas (Addison-Wesley, 1961), pp. 64, 65.